Detected object based image and video effects selection

ABSTRACT

A method of applying an image effect based on recognized objects involves capturing an imaging area comprising at least one object as an image stream through operation of an image sensor. The method recognizes the at least one object in the image stream through operation of an object detection engine. The method communicates at least one correlated image effect control to an image processing engine, in response to the at least one object comprising an optical label. The method communicates at least one matched image effect control to the image processing engine, in response to receiving at least a labeled image stream at an image effect matching algorithm from the object detection engine. The method generates a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.

BACKGROUND

The aesthetic appeal of a digital photo or video may be enhanced with a variety of different imaging effects. These imaging effects may include adjustments/corrections for color, contrast, brightness, etc.; stylistic filters such as grayscale filters, sepia filters, blur filters, etc.; enhancement effects such as object linked augmentation effects, where a digital object or mask is added to the digital photo or video for an identified object; and distortion effects that alter the appearance of the identified objects within the digital photo or video. These imaging effects may be applied and viewed in real time by users before capturing a digital photo or recording the digital video.

While these imaging effects may be utilized by any user attempting to capture or record a digital photo or video, selecting the appropriate and/or most aesthetically appealing imaging effect for a particular scene can be difficult for a novice. In many cases, the appropriate and/or most aesthetically appealing imaging effect may be highly subjective and dependent on a variety of elements such as the subjects and the composition. Therefore, a need exists for automatically selecting imaging effects applied to digital photos or videos.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.

FIG. 1 illustrates a system 100 in accordance with one embodiment.

FIG. 2 illustrates an image effect repository 200 in accordance with one embodiment.

FIG. 3 illustrates a system 300 in accordance with one embodiment.

FIG. 4 illustrates a method 400 in accordance with one embodiment.

FIG. 5 illustrates a method 500 in accordance with one embodiment.

FIG. 6 illustrates a method 600 in accordance with one embodiment.

FIG. 7 illustrates a convolutional neural network 700 in accordance with one embodiment.

FIG. 8 illustrates a convolutional neural network layers 800 in accordance with one embodiment.

FIG. 9 illustrates a VGG net 900 in accordance with one embodiment.

FIG. 10 illustrates a convolution layer filtering 1000 in accordance with one embodiment.

FIG. 11 illustrates a pooling layer function 1100 in accordance with one embodiment.

FIG. 12 illustrates an AR device logic 1200 in accordance with one embodiment.

FIG. 13 illustrates an AR device 1300 that may implement aspects of the machine processes described herein.

FIG. 14 illustrates an augmented reality device logic 1400 in accordance with one embodiment.

DETAILED DESCRIPTION

“Virtual reality” refers to the computer-generated simulation of a three-dimensional environment that can be interacted with in a seemingly real or physical way by a person using special electronic equipment, such as a headset with a display and gloves fitted with sensors.

“Augmented reality” refers to technology that superimposes computer-generated imagery on a user's view of the real world, thus providing a composite view.

“Virtualize” refers to converting a physical thing to a computer-generated simulation of that thing.

“Engine” refers to logic that inputs signals that affect internal processes of the logic to generate deterministic outputs, typically in a manner optimized for efficiency and speed (vs. size or machine resource utilization).

“Correlator” refers to a logic element that identifies a configured association between its inputs. One example of a correlator is a lookup table (LUT) configured in software or firmware. Correlators may be implemented as relational databases. An example LUT correlator is:

   |low_alarm_condition|low_threshold_value|0|
   |safe_condition|safe_lower_bound|safe_upper_bound|
   |high_alarm_condition|high_threshold_value|0|

Generally, a correlator receives two or more inputs and produces an output indicative of a mutual relationship or connection between the inputs. Examples of correlators that do not use LUTs include any of a broad class of statistical correlators that identify dependence between input variables, often the extent to which two input variables have a linear relationship with each other. One commonly used statistical correlator is one that computes Pearson's product-moment coefficient for two input variables (e.g., two digital or analog input signals). Other well-known correlators compute a distance correlation, Spearman's rank correlation, a randomized dependence correlation, and Kendall's rank correlation. Many other examples of correlators will be evident to those of skill in the art, without undue experimentation.
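As a non-limiting illustration of a statistical correlator, the following sketch computes Pearson's product-moment coefficient for two sampled input signals. The function name `pearson` and the array-based signal representation are assumptions made for illustration only and are not part of the disclosure.

```javascript
// Illustrative sketch: Pearson's product-moment coefficient for two
// equal-length arrays of numeric samples. The name `pearson` is hypothetical.
function pearson(x, y) {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("inputs must have the same length (at least 2)");
  }
  const n = x.length;
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;
  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }
  // The coefficient is undefined when either input is constant.
  if (varianceX === 0 || varianceY === 0) return NaN;
  return covariance / Math.sqrt(varianceX * varianceY);
}
```

A value near +1 or -1 indicates a strong linear relationship between the two inputs, while a value near 0 indicates little linear dependence.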

“QR code” refers to a matrix barcode (two-dimensional bar code) comprising (typically) black modules (e.g., square dots) arranged in a square grid on a (typically) white background. The information encoded may be made up of four standardized types (“modes”) of data (numeric, alphanumeric, byte/binary, Kanji) or, through supported extensions, virtually any type of data.

A system and method for selecting imaging effects based on detected objects manipulates an image or video based on detected information inside the image or video. The first part is a database on a server or offline device that can store a list of visual effects including adjustments in color, shape, and lighting for a particular subject or an entire image. Each effect may be selected based on the identification of an optical label, such as a quick response (QR) code or a barcode, visible to the imaging device. The system and method may also provide an algorithm to match an imaging effect to recognized object classifications (e.g., person, place, things, etc.), spatial arrangement of objects, presence of optical labels, and combinations thereof by the camera or the photo application of a device with an image sensor, and then provide the content of the effects, such as color, shape, or lighting adjustment, to the camera or a photo application.

A method of applying an image effect based on recognized objects involves capturing an imaging area comprising at least one object as an image stream through operation of an image sensor. The method recognizes the at least one object in the image stream through operation of an object detection engine. The method communicates at least one correlated image effect control to an image processing engine, in response to the at least one object comprising an optical label. The method communicates at least one matched image effect control to the image processing engine, in response to receiving at least a labeled image stream at an image effect matching algorithm from the object detection engine. The method generates a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.

The object detection engine may be a trained artificial neural network for recognizing the at least one object in the image stream.

The method of applying an image effect based on recognized objects may involve receiving the image stream from the image sensor at the object detection engine. The method may detect the optical label in the image stream and extract an embedded identifier from the optical label through operation of the object detection engine. The method may identify the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.

The method of applying an image effect based on recognized objects may involve receiving the image stream from the image sensor at the object detection engine. The method may detect at least one recognized object in the image stream through operation of the object detection engine. The method may generate the labeled image stream by identifying each recognized object within the image stream with an object classification label and by identifying an object environment and object arrangement within the image stream as a scene with a scene classification label. The method may operate the image effect matching algorithm to match the optical label, the scene classification label, object classification label, or combinations thereof with at least one corresponding image effect control from an image effects repository.

In some configurations, the image effect matching algorithm may utilize a trained artificial neural network to match the optical label, the scene classification label, object classification label, or combinations thereof with the at least one corresponding image effect control from the image effects repository.

In some configurations, the at least one image effect control may be a filter effect control, an object linked augmentation effect control, a distortion effect control, an overlay control, and combinations thereof.

A non-transitory computer-readable storage medium may include instructions that, when executed by a computer, cause the computer to capture an imaging area comprising at least one object as an image stream through operation of an image sensor. The computer may then recognize the at least one object in the image stream through operation of an object detection engine. The computer may then communicate at least one correlated image effect control to an image processing engine in response to the at least one object comprising an optical label. The computer may then communicate at least one matched image effect control to the image processing engine in response to receiving at least a labeled image stream at an image effect matching algorithm from the object detection engine. The computer may then generate a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.

In some configurations, the object detection engine may be a trained artificial neural network for recognizing the at least one object in the image stream.

The instructions may further configure the computer to receive the image stream from the image sensor at the object detection engine. The computer may then detect the optical label in the image stream and extract an embedded identifier from the optical label through operation of the object detection engine. The computer may then identify the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.

The instructions may further configure the computer to receive the image stream from the image sensor at the object detection engine. The computer may then detect at least one recognized object in the image stream through operation of the object detection engine. The computer may then generate the labeled image stream by identifying each recognized object within the image stream with an object classification label and by identifying an object environment and object arrangement within the image stream as a scene with a scene classification label. The computer may then operate the image effect matching algorithm to match the optical label, the scene classification label, object classification label, or combinations thereof with at least one corresponding image effect control from an image effects repository.

In some configurations, the image effect matching algorithm may utilize a trained artificial neural network to match the optical label, the scene classification label, object classification label, or combinations thereof with the at least one corresponding image effect control from the image effects repository.

In some configurations, the at least one image effect control may be a filter effect control, an object linked augmentation effect control, a distortion effect control, an overlay control, and combinations thereof.

A computing apparatus may comprise a processor and a memory storing instructions that, when executed by the processor, configure the computing apparatus to capture an imaging area comprising at least one object as an image stream through operation of an image sensor. The computing apparatus may then recognize the at least one object in the image stream through operation of an object detection engine. The computing apparatus may then communicate at least one correlated image effect control to an image processing engine in response to the at least one object comprising an optical label. The computing apparatus may then communicate at least one matched image effect control to the image processing engine in response to receiving at least a labeled image stream at an image effect matching algorithm from the object detection engine. The computing apparatus may then generate a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.

In some configurations, the object detection engine may be a trained artificial neural network for recognizing the at least one object in the image stream.

The instructions may further configure the apparatus to receive the image stream from the image sensor at the object detection engine. The apparatus may then detect the optical label in the image stream and extract an embedded identifier from the optical label through operation of the object detection engine. The apparatus may then identify the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.

The instructions may further configure the apparatus to receive the image stream from the image sensor at the object detection engine. The apparatus may detect at least one recognized object in the image stream through operation of the object detection engine. The apparatus may generate the labeled image stream by identifying each recognized object within the image stream with an object classification label and by identifying an object environment and object arrangement within the image stream as a scene with a scene classification label. The apparatus may operate the image effect matching algorithm to match the optical label, the scene classification label, object classification label, or combinations thereof with at least one corresponding image effect control from an image effects repository.

In some configurations, the image effect matching algorithm may utilize a trained artificial neural network to match the optical label, the scene classification label, object classification label, or combinations thereof with the at least one corresponding image effect control from the image effects repository.

In some configurations, the at least one image effect control may be a filter effect control, an object linked augmentation effect control, a distortion effect control, an overlay control, and combinations thereof.

In some configurations, a trained artificial neural network utilized by the object detection engine and the image effect matching algorithm may be a type of convolutional neural network (CNN).

Convolutional neural networks (CNNs) are particularly well suited to classifying features in data sets modelled in two or three dimensions. This makes CNNs popular for image classification, because images can be represented in computer memories in three dimensions (two dimensions for width and height, and a third dimension for pixel features like color components and intensity). For example, a color JPEG image of size 480×480 pixels can be modelled in computer memory using an array that is 480×480×3, where each of the values of the third dimension is a red, green, or blue color component intensity for the pixel ranging from 0 to 255. Inputting this array of numbers to a trained CNN will generate outputs that describe the probability of the image being a certain class (0.80 for cat, 0.15 for dog, 0.05 for bird, etc.). Image classification is the task of taking an input image and outputting a class (a cat, dog, etc.) or a probability of classes that best describes the image.

Fundamentally, CNNs input the data set and pass it through a series of convolutional transformations, nonlinear activation functions (e.g., ReLU), and pooling operations (downsampling, e.g., maxpool), followed by an output layer (e.g., softmax), to generate the classifications.
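By way of a simplified, non-limiting sketch, three of the operations named above (ReLU activation, max pooling, and softmax) may be expressed as follows. The function names, the use of plain nested arrays for a feature map, and the 2×2 pooling window with stride 2 are assumptions chosen for illustration; a trained CNN would typically rely on an optimized numerical library.

```javascript
// ReLU: replace negative activations with zero, element by element.
function relu(featureMap) {
  return featureMap.map(row => row.map(v => Math.max(0, v)));
}

// Max pooling with a 2x2 window and stride 2 (downsampling).
// Assumes the feature map has an even number of rows and columns.
function maxPool2x2(featureMap) {
  const pooled = [];
  for (let r = 0; r < featureMap.length; r += 2) {
    const row = [];
    for (let c = 0; c < featureMap[r].length; c += 2) {
      row.push(Math.max(
        featureMap[r][c], featureMap[r][c + 1],
        featureMap[r + 1][c], featureMap[r + 1][c + 1]
      ));
    }
    pooled.push(row);
  }
  return pooled;
}

// Softmax: convert raw class scores into probabilities that sum to 1.
function softmax(scores) {
  const maxScore = Math.max(...scores); // subtracted for numerical stability
  const exps = scores.map(s => Math.exp(s - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}
```

For instance, `softmax` applied to the final-layer scores for the classes cat, dog, and bird yields one probability per class, and the class with the largest score receives the largest probability.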

FIG. 1 illustrates a system 100. The system 100 comprises an imaging area 108, an image sensor 104, a display device 102, an image effect matching algorithm 120, an image processing engine 114, an optical label correlator 116, an image effects repository 122, and an object detection engine 106. The imaging area 108 is captured by the image sensor 104 to generate an image stream 128. The image stream 128 is a continuous image signal of the imaging area 108 that is communicated to the image processing engine 114 and the object detection engine 106. The imaging area 108 comprises at least one object 110 within an object environment 130. The at least one object 110 may include an optical label 112. The object detection engine 106 runs an object detection algorithm utilizing a convolutional neural network (CNN) on the image stream 128 to identify the presence of the at least one object 110. Once the object detection engine 106 determines the presence of recognizable objects within the image stream 128, the object detection engine 106 may utilize an object classification algorithm, which also utilizes a CNN, to identify the at least one object 110 with an object classification label, and to identify the object environment 130 and the object arrangement within the image stream 128 as a scene with a scene classification label. The labeled image stream 138 may then be communicated to the image effect matching algorithm 120.

In some instances, the object detection engine 106 may identify an optical label 112 within the image stream 128. The optical label 112 may include an embedded identifier that may be referenced within a correlation table 118 to a corresponding image effect control 134 within the image effects repository 122. The object detection engine 106 may communicate an embedded identifier 140 to the optical label correlator 116 to reference the correlation table 118 and identify a corresponding image effect control 134 within the image effects repository 122. The optical label correlator 116 may communicate the correlated image effect control 124 to the image processing engine 114. The optical label correlator 116 may communicate information about the image effect control 134, the optical label 112, and/or the embedded identifier 140 to the image effect matching algorithm 120 for identifying additional image effect controls when viewed in combination with the labeled image stream 138.

The image effect matching algorithm 120 matches the optical label 112, the scene classification label, object classification label, or combinations thereof with at least one corresponding image effect control 136 from an image effects repository. Once a matched image effect control 126 is identified, the image effect matching algorithm 120 communicates the matched image effect control 126 to the image processing engine 114. The image processing engine 114 transforms the image stream 128 with the matched image effect control 126 and/or the correlated image effect control 124 to generate a transformed image stream 132 displayable through the display device 102.

The system 100 may be operated in accordance with the process described in FIG. 4, FIG. 5, and FIG. 6.

FIG. 2 illustrates an image effect repository 200 comprising filter effect controls 202, object linked augmentation effect controls 204, distortion effect controls 206, and overlay controls 208.

The filter effect controls 202 are image post processing effects that modify channel values of the image stream. The filter effect controls 202 may be a set of parameters that modify pixel channels based on the specific values within the pixel channels and/or the distribution/relational positioning of the pixel channel with specific values. Examples of image filters include blur filters, brightness filters, contrast filters, grayscale filters, hue filters, color channel filters, saturation filters, sepia filters, spatial lowpass filters, spatial highpass filters, Fourier representation filters, Fourier lowpass filters, Fourier highpass filters, etc.
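As a non-limiting illustration of a filter that modifies pixel channel values, the sketch below applies a brightness factor and a grayscale conversion to a flat array of RGBA values (four channel values of 0 to 255 per pixel). The function names, the flat RGBA layout, and the choice of ITU-R BT.601 luma weights for grayscale are assumptions made for this example only.

```javascript
// Scale the red, green, and blue channels by `factor`; alpha is unchanged.
// Uint8ClampedArray clamps each stored value to the range 0-255.
function applyBrightness(pixels, factor) {
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    out[i] = pixels[i] * factor;         // red
    out[i + 1] = pixels[i + 1] * factor; // green
    out[i + 2] = pixels[i + 2] * factor; // blue
    out[i + 3] = pixels[i + 3];          // alpha
  }
  return out;
}

// Replace each pixel's color with its luma (assumed BT.601 weights).
function applyGrayscale(pixels) {
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    out[i] = luma;
    out[i + 1] = luma;
    out[i + 2] = luma;
    out[i + 3] = pixels[i + 3];
  }
  return out;
}
```

Both functions return a new array, so the original image stream frame remains available for later blending or for applying a different effect control.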

The filter effect control may be configured as a CSS filter. An example of the CSS filter is provided below.

   .blur {
      -webkit-filter: blur(4px);
      filter: blur(4px);
   }
   .brightness {
      -webkit-filter: brightness(0.30);
      filter: brightness(0.30);
   }
   .contrast {
      -webkit-filter: contrast(180%);
      filter: contrast(180%);
   }
   .grayscale {
      -webkit-filter: grayscale(100%);
      filter: grayscale(100%);
   }
   .huerotate {
      -webkit-filter: hue-rotate(180deg);
      filter: hue-rotate(180deg);
   }
   .invert {
      -webkit-filter: invert(100%);
      filter: invert(100%);
   }
   .opacity {
      -webkit-filter: opacity(50%);
      filter: opacity(50%);
   }
   .saturate {
      -webkit-filter: saturate(7);
      filter: saturate(7);
   }
   .sepia {
      -webkit-filter: sepia(100%);
      filter: sepia(100%);
   }
   .shadow {
      -webkit-filter: drop-shadow(8px 10px green);
      filter: drop-shadow(8px 10px green);
   }

The object linked augmentation effect controls 204 may be digital manipulation of an area of the image stream corresponding to a detected object, a portion of the detected object, or an area overlapping or adjacent to the detected object. The digital manipulation may be the removal of a detected object or portion of the detected object with a mask or blend effect. For example, a person's face with blemishes or pimples may be detected, and the digital manipulation may remove the blemishes or pimples with a blend effect that removes the imperfection. In another example, a car that has not been washed may be detected, and the digital manipulation would be to modify the area corresponding to the car to appear as if it were clean. The digital manipulation may be the addition of a digital object such as glasses, hats, etc., to correspond with the movement of the detected object such as a user's face.

The distortion effect controls 206 are enhancements or modifications to an identified object in the image stream. For example, a distortion effect control 206 may be provided to scale or warp the identified object.

The overlay controls 208 are digital manifestations (e.g., shapes, text, colors, etc.) displayed on a layer above the display of the image stream. For example, the overlay control 208 may be a logo displayed above the image stream.

FIG. 3 illustrates a system 300 comprising an image sensor 104, an object detection engine 106, an optical label correlator 116, an image effects repository 122, an image effect matching algorithm 120, a correlation table 118, an image processing engine 114, and a display device 102. The image sensor 104 captures an imaging area 306 and communicates the image stream to the object detection engine 106. The object detection engine 106 detects objects within the image stream and identifies the recognized objects 304 as a family posing (object arrangement) in front of a mountain (object environment 312). The object detection engine 106 assigns an object classification label for the family and a scene classification label for posing in front of the mountain, generating a labeled image stream 316.

The object detection engine 106 identifies that the family is standing next to a sign with a QR code (optical label 302). The object detection engine 106 then identifies the embedded identifier from the optical label 302. The embedded identifier is communicated to the optical label correlator 116, which looks up the identifier in a correlation table 118 to locate a corresponding image effect control in the image effects repository 122.

The labeled image stream 316 is communicated to the image effect matching algorithm 120. The image effect matching algorithm 120 may utilize the information from the optical label correlator 116 for the optical label 302 with the information in the labeled image stream 316 to identify matching image effect controls that may be utilized in combination with the image effect controls identified by the optical label correlator 116. The image processing engine 114 receives an object linked augmentation effect control 310 and a filter effect control 308 to generate a transformed image stream 314 displayable through the display device 102. The object linked augmentation effect control 310 is utilized to remove the sign with the QR code from the image stream. This may be done by replacing the area of the sign with a composite image approximating the background. The filter effect control 308 is utilized to adjust the color balance of the object environment 312, making the mountains appear darker.

In FIG. 4, a method 400 captures an imaging area comprising at least one object as an image stream through operation of an image sensor (block 402). In block 404, the method 400 recognizes the at least one object in the image stream through operation of an object detection engine. In block 406, the method 400 communicates at least one correlated image effect control to an image processing engine, in response to the at least one object comprising an optical label. In block 408, the method 400 communicates at least one matched image effect control to the image processing engine, in response to receiving at least a labeled image stream at an image effect matching algorithm from the object detection engine. In block 410, the method 400 generates a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.

In FIG. 5, a method 500 receives an image stream from the image sensor at an object detection engine (block 502). In block 504, the method 500 detects an optical label in the image stream and extracts an embedded identifier from the optical label through operation of the object detection engine. In block 506, the method 500 identifies the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.
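As a non-limiting sketch of block 506, the optical label correlator may be modelled as a lookup from an embedded identifier to entries in an image effects repository. The table contents, identifier strings, effect names, parameter values, and the function name `correlateOpticalLabel` below are all hypothetical and are provided only to illustrate the lookup.

```javascript
// Hypothetical correlation table: embedded identifier -> effect control keys.
const correlationTable = new Map([
  ["noodle-shop-001", ["warm_color_boost", "contrast_enhance"]],
  ["mountain-sign-042", ["darken_background"]],
]);

// Hypothetical image effects repository: effect control key -> control.
const imageEffectsRepository = new Map([
  ["warm_color_boost", { type: "filter", params: { saturate: 1.2 } }],
  ["contrast_enhance", { type: "filter", params: { contrast: 1.3 } }],
  ["darken_background", { type: "filter", params: { brightness: 0.8 } }],
]);

// Compare the embedded identifier against the table and return the
// correlated image effect controls (an empty array when nothing matches).
function correlateOpticalLabel(embeddedIdentifier) {
  const keys = correlationTable.get(embeddedIdentifier) || [];
  return keys
    .map(key => imageEffectsRepository.get(key))
    .filter(control => control !== undefined);
}
```

Returning an empty array for an unknown identifier lets the image processing engine proceed with only the matched image effect controls, if any.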

In FIG. 6, a method 600 receives an image stream from an image sensor at an object detection engine (block 602). In block 604, the method 600 detects at least one recognized object in the image stream through operation of the object detection engine. In block 606, the method 600 generates a labeled image stream. In subroutine block 608, the method 600 identifies each recognized object within the image stream with an object classification label. In subroutine block 610, the method 600 identifies an object environment and object arrangement within the image stream as a scene with a scene classification label. In block 612, the method 600 operates the image effect matching algorithm to match the optical label, the scene classification label, object classification label, and combinations thereof with at least one corresponding image effect control from an image effects repository.

In some configurations, the image effect matching algorithm may be operated in accordance with the following code listing:

   // object_and_qr_detection and database_execute are assumed to be
   // provided elsewhere; they are not defined in this listing.
   function find_effect_through_object_and_qr_code_match(camera_feed) {
      // query format
      var prepared_query = {objects: [], scenes: [], qr_code: []};
      // detect objects and qr codes
      prepared_query = object_and_qr_detection(camera_feed);
      // look up all effects matching objects, scenes or qr code
      var results = database_execute("select",
         "* from effects where objects include ? or scenes include ? or qr_code = ? grouped by object",
         prepared_query.objects, prepared_query.scenes, prepared_query.qr_code);
      // gather and sort effects by matching objects
      // (sort a copy so the two orderings do not overwrite each other)
      var object_based_effects = results.slice().sort(function (a, b) {
         return a.objects.count - b.objects.count;
      });
      // gather and sort effects by matching scenes
      var scene_based_effects = results.slice().sort(function (a, b) {
         return a.scenes.count - b.scenes.count;
      });
      // gather qr effect if there is a match
      var qr_based_effects = results.qr_code_effect;
      return {
         object_based_effects: object_based_effects,
         scene_based_effects: scene_based_effects,
         qr_based_effects: qr_based_effects
      };
   }

In one example, a camera detects a scene with a green tree and a car under the tree and sunny weather. The matching algorithm looks up possible adjustments for green trees under sunny weather as well as cars under sunny weather. The algorithm may find a hue enhancement to compensate for saturation lost by the tree, as well as a paint color enhancement for the car. The algorithm returns these object-specific manipulations to the image processing engine, which renders the effects to each object, in this case the tree and the car.

In another example, the camera detects a QR code in the scene. The matching algorithm looks up the effect linked to the QR code object and returns the effects to the image processing engine, which renders the effect to the whole image. In this case, the QR code is placed in a restaurant that serves noodles, and the effects center on color, lighting, and contrast enhancement for noodle related food.

In another example, the camera detects a person holding a red flower in a dark indoor scene. The matching algorithm looks up possible adjustments for the person in the dark indoor scene, as well as for flowers in a dark indoor scene. The algorithm finds adjustments for the person and the flower and returns these adjustments to the image processing engine, which renders the effects directly to each object, in this case the person and the flower.

The system may provide configuration options and settings that allow blending the image control effects with the image stream based on a percentage from 0% to 100%. This may allow the user to determine how strong the effects look. At 0% all effects are turned off, at 50% the effects are blended equally with the original image, and at 100% the effects are at full rendition to the image.
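A simplified, non-limiting sketch of such percentage-based blending follows. It assumes linear interpolation between the original and the fully effected channel values; the function name `blendPixels` and the array representation are hypothetical, and other blending curves could equally be used.

```javascript
// Linearly blend original and effected channel values.
// percent = 0 returns the original, 100 returns the full effect.
function blendPixels(original, effected, percent) {
  if (original.length !== effected.length) {
    throw new Error("both frames must have the same number of values");
  }
  const t = Math.min(100, Math.max(0, percent)) / 100; // clamp to 0..1
  return original.map((value, i) =>
    Math.round(value * (1 - t) + effected[i] * t)
  );
}
```

With this sketch, a setting of 50% produces the midpoint between each original channel value and its effected counterpart.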

In some configurations, the database of entries and objects for the object classifications and image control effects may be the same for all users, and the look up may be a simple object and entry look up.

The system may also combine the size and orientation of the object, the geolocation of the photo, and the time and season the photo was taken together with the match. If there is no exact match, the system may determine the closest matching settings. For example, if there is no big green tree with effects, the system may drop the green or big object modifiers in the search until an image control effect is found.
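The fallback search described above may be sketched, in a non-limiting way, as follows. The sketch assumes that effects are indexed by a space-joined description such as "big green tree" and that modifiers are dropped one at a time from the front of the list; the order in which modifiers are dropped, the key format, and the function name `findClosestEffect` are assumptions made for illustration.

```javascript
// effects: Map from a description key (e.g. "green tree") to an effect.
// modifiers: ordered list of modifiers, e.g. ["big", "green"].
// Tries "big green tree", then "green tree", then "tree".
function findClosestEffect(effects, modifiers, objectLabel) {
  for (let dropped = 0; dropped <= modifiers.length; dropped++) {
    const key = [...modifiers.slice(dropped), objectLabel].join(" ");
    if (effects.has(key)) {
      return { key: key, effect: effects.get(key) };
    }
  }
  return null; // no effect found even for the bare object label
}
```

A fuller implementation might also try dropping modifiers in other orders or weight candidates by geolocation, time, and season, as described above.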

The database may also be more specific or tailored to specific users.The database may be locally built and catered to how each user isediting photos manually, so entries created through the photo editingprocess are stored associated with that particular user. The look upphase may be for identifying the most popular edit for certain objects.For example, if a user edits a photo and 90% of the time, they increasethe green hue of a tree by +2, then the result match should be anincrease in the green hue of the tree +2.

Instead of the most popular edit, the system may provide users averages for specific settings in certain conditions. For example, if 3 out of 5 times a user did +1 to a color channel value and 2 out of 5 times they did +2, the result match would be +7/5 for that particular adjustment.
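The two look up strategies above may be compared with a short sketch. The function names and the representation of a user's edit history as a list of adjustment values are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction


def most_popular_edit(history):
    """Return the adjustment value the user applied most often."""
    return Counter(history).most_common(1)[0][0]


def average_edit(history):
    """Return the exact mean of the user's adjustment values."""
    return Fraction(sum(history), len(history))


# Three edits of +1 and two edits of +2 to a color channel value.
history = [1, 1, 1, 2, 2]

print(most_popular_edit(history))  # the most frequent adjustment
print(average_edit(history))       # (3*1 + 2*2) / 5 = 7/5
```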

FIG. 7 illustrates an exemplary convolutional neural network 700 useful for the apparatuses and methods of the disclosure. The convolutional neural network 700 arranges its neurons in three dimensions (width, height, depth), as visualized in convolutional layer 704. Every layer of the convolutional neural network 700 transforms a 3D volume of inputs to a 3D output volume of neuron activations. In this example, the input layer 702 encodes the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels). The convolutional layer 704 further transforms the outputs of the input layer 702, and the output layer 706 transforms the outputs of the convolutional layer 704 into one or more classifications of the image content.

FIG. 8 illustrates exemplary convolutional neural network layers 800 in more detail. An example subregion of the input layer region 804, within an input layer region 802 of an image, is analyzed by a convolutional layer subregion 808 in the convolutional layer 806. The input layer region 802 is 32×32 neurons long and wide (e.g., 32×32 pixels), and three neurons deep (e.g., three color channels per pixel). Each neuron in the convolutional layer 806 is connected only to a local region in the input layer region 802 spatially (in height and width), but to the full depth (i.e., all color channels if the input is an image). Note that there are multiple neurons (5 in this example) along the depth of the convolutional layer subregion 808 that analyzes the subregion of the input layer region 804 of the input layer region 802, in which each neuron of the convolutional layer subregion 808 may receive inputs from every neuron of the subregion of the input layer region 804.

FIG. 9 illustrates a popular form of a CNN known as a VGG net 900. The initial convolution layer 902 stores the raw image pixels and the final pooling layer 920 determines the class scores. Each of the intermediate convolution layers (convolution layer 906, convolution layer 912, and convolution layer 916) and rectifier activations (RELU layer 904, RELU layer 908, RELU layer 914, and RELU layer 918) and intermediate pooling layers (pooling layer 910, pooling layer 920) along the processing path is shown as a column.

The VGG net 900 replaces the large single-layer filters of basic CNNs with multiple 3×3 sized filters in series. With a given receptive field (the effective area size of input image on which output depends), multiple stacked smaller size filters may perform better at image feature classification than a single layer with a larger filter size, because multiple non-linear layers increase the depth of the network which enables it to learn more complex features. In a VGG net 900 each pooling layer may be only 2×2.
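The relationship between stacked small filters and a single larger filter may be illustrated with a short calculation. The sketch assumes stride-one convolutions and counts weights per pair of input and output channels while ignoring biases; both function names are hypothetical.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field width of `num_layers` stacked stride-one filters."""
    return 1 + num_layers * (kernel_size - 1)


def weights_per_channel_pair(num_layers, kernel_size):
    """Filter weights per input/output channel pair, ignoring biases."""
    return num_layers * kernel_size * kernel_size


# Three stacked 3x3 filters see the same 7x7 input area as one 7x7 filter.
print(stacked_receptive_field(3))      # receptive field width of the stack
print(weights_per_channel_pair(3, 3))  # weights in three stacked 3x3 filters
print(weights_per_channel_pair(1, 7))  # weights in a single 7x7 filter
```

Under these assumptions, the stacked arrangement covers the same receptive field with fewer weights while inserting additional non-linear layers between the filters.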

FIG. 10 illustrates a convolution layer filtering 1000 that connects the outputs from groups of neurons in a convolution layer 1002 to neurons in a next layer 1006. A receptive field is defined for the convolution layer 1002, in this example sets of 5×5 neurons. The collective outputs of each neuron in the receptive field are weighted and mapped to a single neuron in the next layer 1006. This weighted mapping is referred to as the filter 1004 for the convolution layer 1002 (or sometimes referred to as the kernel of the convolution layer 1002). The filter 1004 depth is not illustrated in this example (i.e., the filter 1004 is actually a cubic volume of neurons in the convolution layer 1002, not a square as illustrated). Thus, what is shown is a “slice” of the full filter 1004. The filter 1004 is slid, or convolved, around the input image, each time mapping to a different neuron in the next layer 1006. For example, FIG. 10 shows how the filter 1004 is stepped to the right by 1 unit (the “stride”), creating a slightly offset receptive field from the top one, and mapping its output to the next neuron in the next layer 1006. The stride can be and often is other numbers besides one, with larger strides reducing the overlaps in the receptive fields, and hence further reducing the size of the next layer 1006. Every unique receptive field in the convolution layer 1002 that can be defined in this stepwise manner maps to a different neuron in the next layer 1006. Thus, if the convolution layer 1002 is 32×32×3 neurons per slice, the next layer 1006 need only be 28×28×1 neurons to cover all the receptive fields of the convolution layer 1002. This is referred to as an activation map or feature map. There is thus a reduction in layer complexity from the filtering. There are 784 different ways that a 5×5 filter can uniquely fit on a 32×32 convolution layer 1002, so the next layer 1006 need only be 28×28. The depth of the convolution layer 1002 is also reduced from 3 to 1 in the next layer 1006.
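The layer sizes in this example may be checked with a minimal sketch of the filtering. The code assumes no padding and represents the input as nested lists indexed by row, column, and depth; the function names and the all-ones test data are illustrative only.

```python
def conv_output_size(input_size, filter_size, stride=1):
    """Number of filter positions along one dimension, without padding."""
    return (input_size - filter_size) // stride + 1


def convolve(volume, kernel, stride=1):
    """Slide a k x k x depth filter over a height x width x depth volume.

    Each filter position sums the weighted inputs across the full depth,
    so the result is a single 2D feature map (depth 1).
    """
    k = len(kernel)
    depth = len(kernel[0][0])
    out_h = conv_output_size(len(volume), k, stride)
    out_w = conv_output_size(len(volume[0]), k, stride)
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    for c in range(depth):
                        total += (volume[i * stride + di][j * stride + dj][c]
                                  * kernel[di][dj][c])
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 32x32x3 input of ones filtered by a 5x5x3 filter of ones.
image = [[[1.0] * 3 for _ in range(32)] for _ in range(32)]
kernel = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]
feature_map = convolve(image, kernel)

print(len(feature_map), len(feature_map[0]))  # 28 28
print(sum(len(row) for row in feature_map))   # 784 filter positions
```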

The number of total layers to use in a CNN, the number of convolution layers, the filter sizes, and the values for strides at each layer are examples of “hyperparameters” of the CNN.

FIG. 11 illustrates a pooling layer function 1100 with a 2×2 receptive field and a stride of two. The pooling layer function 1100 is an example of the maxpool pooling technique. The outputs of all the neurons in a particular receptive field of the input layer 1102 are replaced by the maximum valued one of those outputs in the pooling layer 1104. Other options for pooling layers are average pooling and L2-norm pooling. The reason to use a pooling layer is that once a specific feature is recognized in the original input volume (there will be a high activation value), its exact location is not as important as its relative location to the other features. Pooling layers can drastically reduce the spatial dimension of the input layer 1102 from that point forward in the neural network (the length and the width change but not the depth). This serves two main purposes. The first is that the number of parameters or weights is greatly reduced, thus lessening the computation cost. The second is that it will control overfitting. Overfitting refers to when a model is so tuned to the training examples that it is not able to generalize well when applied to live data sets.
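A minimal sketch of the maxpool technique with a 2×2 receptive field and a stride of two follows. The function name `max_pool` and the sample activation values are assumptions for illustration; the sketch operates on a single depth slice.

```python
def max_pool(feature_map, size=2, stride=2):
    """Replace each size x size window with its maximum activation."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


activations = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 8],
]

# The 4x4 input is reduced to 2x2; each output is the maximum of one window.
print(max_pool(activations))  # [[4, 5], [3, 8]]
```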

FIG. 12 illustrates a functional block diagram of an embodiment of AR device logic 1200. The AR device logic 1200 comprises the following functional modules: a rendering engine 1216, local augmentation logic 1214, local modeling logic 1208, device tracking logic 1206, an encoder 1212, and a decoder 1220. Each of these functional modules may be implemented in software, dedicated hardware, firmware, or a combination of these logic types.

The rendering engine 1216 controls the graphics engine 1218 to generate a stereoscopic image visible to the wearer, i.e. to generate slightly different images that are projected onto different eyes by the optical components of a headset substantially simultaneously, so as to create the impression of 3D structure.

The stereoscopic image is formed by rendering engine 1216 rendering at least one virtual display element (“augmentation”), which is perceived as a 3D element, i.e. having perceived 3D structure, at a real-world location in 3D space by the user.

An augmentation is defined by an augmentation object stored in the memory 1202. The augmentation object comprises: location data defining a desired location in 3D space for the virtual element (e.g. as (x,y,z) Cartesian coordinates); structural data defining 3D surface structure of the virtual element, i.e. a 3D model of the virtual element; and image data defining 2D surface texture of the virtual element to be applied to the surfaces defined by the 3D model. The augmentation object may comprise additional information, such as a desired orientation of the augmentation.

The perceived 3D effects are achieved through suitable rendering of the augmentation object. To give the impression of the augmentation having 3D structure, a stereoscopic image is generated based on the 2D surface and 3D augmentation model data in the data object, with the augmentation being rendered to appear at the desired location in the stereoscopic image.

A 3D model of a physical object is used to give the impression of the real world having expected tangible effects on the augmentation, in the way that it would on a real-world object. The 3D model represents structure present in the real world, and the information it provides about this structure allows an augmentation to be displayed as though it were a real-world 3D object, thereby providing an immersive augmented reality experience. The 3D model is in the form of a 3D mesh.

For example, based on the model of the real world, an impression can be given of the augmentation being obscured by a real-world object that is in front of its perceived location from the perspective of the user; dynamically interacting with a real-world object, e.g. by moving around the object; statically interacting with a real-world object, say by sitting on top of it etc.

Whether or not real-world structure should affect an augmentation can be determined based on suitable rendering criteria. For example, by creating a 3D model of the perceived AR world, which includes the real-world surface structure and any augmentations, and projecting it onto a plane along the AR user's line of sight as determined using pose tracking (see below), a suitable criterion for determining whether a real-world object should be perceived as partially obscuring an augmentation is whether the projection of the real-world object in the plane overlaps with the projection of the augmentation, which could be further refined to account for transparent or opaque real-world structures. Generally, the criteria can depend on the location and/or orientation of the augmented reality device and/or the real-world structure in question.
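The overlap criterion may be sketched as follows, under simplifying assumptions. Each projection is approximated by an axis-aligned bounding rectangle in the view plane together with a single distance from the viewer; the `Projection` class, its field names, and the depth comparison are hypothetical simplifications, and an implementation working on the mesh itself would test the projected surfaces rather than their bounding rectangles.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    """Bounding rectangle of a projected object, plus distance from viewer."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    depth: float


def projections_overlap(a, b):
    """True when the two rectangles share some area in the view plane."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def is_partially_obscured(augmentation, real_object):
    """True when the real object is nearer the viewer and overlaps."""
    return (real_object.depth < augmentation.depth
            and projections_overlap(augmentation, real_object))


augmentation = Projection(0.0, 2.0, 0.0, 2.0, depth=5.0)
pillar = Projection(1.0, 3.0, 1.0, 3.0, depth=3.0)  # in front, overlapping
wall = Projection(1.0, 3.0, 1.0, 3.0, depth=10.0)   # behind the augmentation

print(is_partially_obscured(augmentation, pillar))  # True
print(is_partially_obscured(augmentation, wall))    # False
```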

An augmentation can also be mapped to the mesh, in the sense that its desired location and/or orientation is defined relative to a certain structure(s) in the mesh. Should that structure move and/or rotate causing a corresponding change in the mesh, when rendered properly this will cause a corresponding change in the location and/or orientation of the augmentation. For example, the desired location of an augmentation may be on, and defined relative to, a table top structure; should the table be moved, the augmentation moves with it. Object recognition can be used to this end, for example to recognize a known shape of table and thereby detect when the table has moved using its recognizable structure. Such object recognition techniques are known in the art.

An augmentation that is mapped to the mesh in this manner, or is otherwise associated with a particular piece of surface structure embodied in a 3D model, is referred to as an “annotation” to that piece of surface structure. In order to annotate a piece of real-world surface structure, it is necessary to have that surface structure represented by the 3D model in question; without this, the real-world structure cannot be annotated.

The local modeling logic 1208 generates a local 3D model “LM” of the environment in the memory 1202, using the AR device's own sensor(s) e.g. cameras 1210 and/or any dedicated depth sensors etc. The local modeling logic 1208 and sensor(s) constitute sensing apparatus.

The device tracking logic 1206 tracks the location and orientation of the AR device, e.g. a headset, using local sensor readings captured from the AR device. The sensor readings can be captured in a number of ways, for example using the cameras 1210 and/or other sensor(s) such as accelerometers. The device tracking logic 1206 determines the current location and orientation of the AR device and provides this information to the rendering engine 1216, for example by outputting a current “pose vector” of the AR device. The pose vector is a six dimensional vector, for example (x, y, z, P, R, Y) where (x,y,z) are the device's Cartesian coordinates with respect to a suitable origin, and (P, R, Y) are the device's pitch, roll and yaw with respect to suitable reference axes.
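One possible in-memory representation of the pose vector is sketched below. The class name `PoseVector`, its method names, and the sample values are hypothetical, and the units of the coordinates and angles are left unspecified because the description does not state them.

```python
from dataclasses import dataclass, astuple, replace


@dataclass(frozen=True)
class PoseVector:
    """Six dimensional pose (x, y, z, P, R, Y) of the AR device."""
    x: float
    y: float
    z: float
    pitch: float
    roll: float
    yaw: float

    def as_vector(self):
        """Return the pose as a plain six-element tuple."""
        return astuple(self)

    def translated(self, dx, dy, dz):
        """Return a new pose moved by the given offsets, same orientation."""
        return replace(self, x=self.x + dx, y=self.y + dy, z=self.z + dz)


pose = PoseVector(1.0, 2.0, 3.0, 0.0, 0.0, 90.0)
print(pose.as_vector())
print(pose.translated(0.5, 0.0, 0.0).as_vector())
```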

The rendering engine 1216 adapts the local model based on the tracking, to account for the movement of the device, i.e. to maintain the perception of the augmentations as 3D elements occupying the real world, for example to ensure that static augmentations appear to remain static (which will in fact be achieved by scaling or rotating them as, from the AR user's perspective, the environment is moving relative to them).

The encoder 1212 receives image data from the cameras 1210 and audio data from the microphones 1204 and possibly other types of data (e.g., annotation or text generated by the user of the AR device using the local augmentation logic 1214) and transmits that information to other devices, for example the devices of collaborators in the AR environment. The decoder 1220 receives an incoming data stream from other devices, and extracts audio, video, and possibly other types of data (e.g., annotations, text) therefrom.

FIG. 13 illustrates more aspects of an AR device 1300 according to one embodiment. The AR device 1300 comprises processing units 1302, input devices 1304, memory 1306, output devices 1308, storage devices 1310, a network interface 1312, and various logic to carry out the processes disclosed herein.

The input devices 1304 comprise transducers that convert physical phenomena into machine internal signals, typically electrical, optical or magnetic signals. Signals may also be wireless in the form of electromagnetic radiation in the radio frequency (RF) range but also potentially in the infrared or optical range. Examples of input devices 1304 are keyboards which respond to touch or physical pressure from an object or proximity of an object to a surface, mice which respond to motion through space or across a plane, microphones which convert vibrations in the medium (typically air) into device signals, and scanners which convert optical patterns on two or three dimensional objects into device signals. The signals from the input devices 1304 are provided via various machine signal conductors (e.g., busses or network interfaces) and circuits to memory 1306.

The memory 1306 provides for storage (via configuration of matter or states of matter) of signals received from the input devices 1304, instructions and information for controlling operation of the processing units 1302, and signals from storage devices 1310. The memory 1306 may in fact comprise multiple memory devices of different types, for example random access memory devices and non-volatile (e.g., FLASH memory) devices.

Information stored in the memory 1306 is typically directly accessible to the processing units 1302 of the device. Signals input to the AR device 1300 cause the reconfiguration of the internal material/energy state of the memory 1306, creating logic that in essence forms a new machine configuration, influencing the behavior of the AR device 1300 by affecting the behavior of the processing units 1302 with control signals (instructions) and data provided in conjunction with the control signals. In the AR device 1300, the memory 1306 comprises logic 1314, logic 1316, logic 1318, and logic 1320.

The storage devices 1310 may provide a slower but higher capacity machine memory capability. Examples of storage devices 1310 are hard disks, optical disks, large capacity flash memories or other non-volatile memory technologies, and magnetic memories.

The processing units 1302 may cause the configuration of the memory 1306 to be altered by signals in the storage devices 1310. In other words, the processing units 1302 may cause data and instructions to be read from storage devices 1310 into the memory 1306, from which they may then influence the operations of processing units 1302 as instructions and data signals, and from which they may also be provided to the output devices 1308. The processing units 1302 may alter the content of the memory 1306 by signaling to a machine interface of memory 1306 to alter the internal configuration, and may then send converted signals to the storage devices 1310 to alter their material internal configuration. In other words, data and instructions may be backed up from memory 1306, which is often volatile, to storage devices 1310, which are often non-volatile.

Output devices 1308 are transducers which convert signals received from the memory 1306 into physical phenomena such as vibrations in the air, or patterns of light on a machine display, or vibrations (e.g., haptic devices) or patterns of ink or other materials (e.g., printers and 3-D printers).

The network interface 1312 receives signals from the memory 1306 or processing units 1302 and converts them into electrical, optical, or wireless signals to other machines, typically via a machine network. The network interface 1312 also receives signals from the machine network and converts them into electrical, optical, or wireless signals to the memory 1306 or processing units 1302.

FIG. 14 illustrates components of an exemplary augmented reality device logic 1400. The augmented reality device logic 1400 comprises a graphics engine 1402, cameras 1416, processing units 1408, including one or more CPU 1410 and/or GPU 1412, a WiFi 1414 wireless interface, a Bluetooth 1418 wireless interface, speakers 1422, microphones 1404, and one or more memory 1406.

The processing units 1408 may in some cases comprise programmable devices such as bespoke processing units optimized for a particular function, such as AR related functions. The augmented reality device logic 1400 may comprise other components that are not shown, such as dedicated depth sensors, additional interfaces etc.

Some or all of the components in FIG. 14 may be housed in an AR headset. In some embodiments, some of these components may be housed in a separate housing connected to or in wireless communication with the components of the AR headset. For example, a separate housing for some components may be designed to be worn on a belt or to fit in the wearer's pocket, or one or more of the components may be housed in a separate computer device (smartphone, tablet, laptop or desktop computer etc.) which communicates wirelessly with the display and camera apparatus in the AR headset, whereby the headset and separate device constitute the full augmented reality device logic 1400.

The memory 1406 comprises logic 1420 to be applied to the processing units 1408 to execute. In some cases, different parts of the logic 1420 may be executed by different components of the processing units 1408. The logic 1420 typically comprises code of an operating system, as well as code of one or more applications configured to run on the operating system to carry out aspects of the processes disclosed herein.

Implementations and Terminology

Terms used herein should be accorded their ordinary meaning in the relevant arts, or the meaning indicated by their use in context, but if an express definition is provided, that meaning controls.

“Circuitry” in this context refers to electrical circuitry having at least one discrete electrical circuit, electrical circuitry having at least one integrated circuit, electrical circuitry having at least one application specific integrated circuit, circuitry forming a general purpose computing device configured by a computer program (e.g., a general purpose computer configured by a computer program which at least partially carries out processes or devices described herein, or a microprocessor configured by a computer program which at least partially carries out processes or devices described herein), circuitry forming a memory device (e.g., forms of random access memory), or circuitry forming a communications device (e.g., a modem, communications switch, or optical-electrical equipment).

“Firmware” in this context refers to software logic embodied as processor-executable instructions stored in read-only memories or media.

“Hardware” in this context refers to logic embodied as analog or digital circuitry.

“Logic” in this context refers to machine memory circuits, non-transitory machine readable media, and/or circuitry which by way of its material and/or material-energy configuration comprises control and/or procedural signals, and/or settings and values (such as resistance, impedance, capacitance, inductance, current/voltage ratings, etc.), that may be applied to influence the operation of a device. Magnetic media, electronic circuits, electrical and optical memory (both volatile and nonvolatile), and firmware are examples of logic. Logic specifically excludes pure signals or software per se (however does not exclude machine memories comprising software and thereby forming configurations of matter).

“Software” in this context refers to logic implemented as processor-executable instructions in a machine memory (e.g. read/write volatile or nonvolatile memory or media).

Herein, references to “one embodiment” or “an embodiment” do not necessarily refer to the same embodiment, although they may. Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively, unless expressly limited to a single one or multiple ones. Additionally, the words “herein,” “above,” “below” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. When the claims use the word “or” in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list, unless expressly limited to one or the other. Any terms not expressly defined herein have their conventional meaning as commonly understood by those having skill in the relevant art(s).

Various logic functional operations described herein may be implemented in logic that is referred to using a noun or noun phrase reflecting said operation or function. For example, an association operation may be carried out by an “associator” or “correlator”. Likewise, switching may be carried out by a “switch”, selection by a “selector”, and so on.

What is claimed is:
 1. A method of applying an image effect based on recognized objects, the method comprising: capturing an imaging area comprising at least one object as an image stream through operation of an image sensor; recognizing the at least one object in the image stream through operation of an object detection engine; communicating at least one correlated image effect control to an image processing engine, in response to the at least one object comprising an optical label; communicating at least one matched image effect control to the image processing engine, in response to receiving a labeled image stream at an image effect matching algorithm from the object detection engine; and generating a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.
 2. The method of claim 1, wherein the object detection engine is a trained artificial neural network for recognizing the at least one object in the image stream.
 3. The method of claim 1 further comprising: receiving the image stream from the image sensor at the object detection engine; detecting the optical label in the image stream and extracting an embedded identifier from the optical label through operation of the object detection engine; and identifying the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.
 4. The method of claim 1 further comprising: receiving the image stream from the image sensor at the object detection engine; detecting at least one recognized object in the image stream through operation of the object detection engine; generating the labeled image stream by: identifying each recognized object within the image stream with an object classification label; and identifying object environment and object arrangement within the image stream as a scene with a scene classification label; and operating the image effect matching algorithm to match at least one of the optical label, the scene classification label, object classification label, and combinations thereof with at least one corresponding image effect control from an image effects repository.
 5. The method of claim 4 wherein the image effect matching algorithm utilizes a trained artificial neural network to match at least one of the optical label, the scene classification label, object classification label, and combinations thereof with the at least one corresponding image effect control from the image effects repository.
 6. The method of claim 1, wherein the at least one image effect control is at least one of a filter effect control, an object linked augmentation effect control, a distortion effect control, an overlay control, and combinations thereof.
 7. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to: capture an imaging area comprising at least one object as an image stream through operation of an image sensor; recognize the at least one object in the image stream through operation of an object detection engine; communicate at least one correlated image effect control to an image processing engine, in response to the at least one object comprising an optical label; communicate at least one matched image effect control to the image processing engine, in response to receiving a labeled image stream at an image effect matching algorithm from the object detection engine; and generate a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.
 8. The computer-readable storage medium of claim 7, wherein the object detection engine is a trained artificial neural network for recognizing the at least one object in the image stream.
 9. The computer-readable storage medium of claim 7 wherein the instructions further configure the computer to: receive the image stream from the image sensor at the object detection engine; detect the optical label in the image stream and extracting an embedded identifier from the optical label through operation of the object detection engine; and identify the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.
 10. The computer-readable storage medium of claim 7 wherein the instructions further configure the computer to: receive the image stream from the image sensor at the object detection engine; detect at least one recognized object in the image stream through operation of the object detection engine; generate the labeled image stream by: identify each recognized object within the image stream with an object classification label; and identify object environment and object arrangement within the image stream as a scene with a scene classification label; and operate the image effect matching algorithm to match at least one of the optical label, the scene classification label, object classification label, and combinations thereof with at least one corresponding image effect control from an image effects repository.
 11. The computer-readable storage medium of claim 10 wherein the image effect matching algorithm utilizes a trained artificial neural network to match at least one of the optical label, the scene classification label, object classification label, and combinations thereof with the at least one corresponding image effect control from the image effects repository.
 12. The computer-readable storage medium of claim 7, wherein the at least one image effect control is at least one of a filter effect control, an object linked augmentation effect control, a distortion effect control, an overlay control, and combinations thereof.
 13. A computing apparatus, the computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to: capture an imaging area comprising at least one object as an image stream through operation of an image sensor; recognize the at least one object in the image stream through operation of an object detection engine; communicate at least one correlated image effect control to an image processing engine, in response to the at least one object comprising an optical label; communicate at least one matched image effect control to the image processing engine, in response to receiving a labeled image stream at an image effect matching algorithm from the object detection engine; and generate a transformed image stream displayable through a display device by applying at least one image effect control to the image stream through operation of the image processing engine.
 14. The computing apparatus of claim 13, wherein the object detection engine is a trained artificial neural network for recognizing the at least one object in the image stream.
 15. The computing apparatus of claim 13 wherein the instructions further configure the apparatus to: receive the image stream from the image sensor at the object detection engine; detect the optical label in the image stream and extracting an embedded identifier from the optical label through operation of the object detection engine; and identify the at least one image effect control in an image effects repository through comparison of the embedded identifier to optical label identifiers in a correlation table through operation of an optical label correlator.
 16. The computing apparatus of claim 13 wherein the instructions further configure the apparatus to: receive the image stream from the image sensor at the object detection engine; detect at least one recognized object in the image stream through operation of the object detection engine; generate the labeled image stream by: identify each recognized object within the image stream with an object classification label; and identify object environment and object arrangement within the image stream as a scene with a scene classification label; and operate the image effect matching algorithm to match at least one of the optical label, the scene classification label, object classification label, and combinations thereof with at least one corresponding image effect control from an image effects repository.
 17. The computing apparatus of claim 16 wherein the image effect matching algorithm utilizes a trained artificial neural network to match at least one of the optical label, the scene classification label, object classification label, and combinations thereof with the at least one corresponding image effect control from the image effects repository.
 18. The computing apparatus of claim 13, wherein the at least one image effect control is at least one of a filter effect control, an object linked augmentation effect control, a distortion effect control, an overlay control, and combinations thereof.